Boxi Cao

Combinatorial Synthesis: Scaling Code RLVR via Atomic Decomposition and Recombination

May 29, 2026

LiteCoder-Terminal: Scaling Long-Horizon Terminal Environments for Learning Language Agents

May 28, 2026

MetaphorVU: Towards Metaphorical Video Understanding

May 25, 2026

Learning from Failures: Correction-Oriented Policy Optimization with Verifiable Rewards

May 14, 2026

All Languages Matter: Understanding and Mitigating Language Bias in Multilingual RAG

Apr 22, 2026

Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces

Apr 09, 2026

P^2O: Joint Policy and Prompt Optimization

Mar 23, 2026

Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards

Mar 10, 2026

Memorizing is Not Enough: Deep Knowledge Injection Through Reasoning

Apr 01, 2025

Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering

Nov 18, 2024